import numpy as np
import plotly.express as px
def markRecapture(P, m, S):
population = [0] * P
for i in range(m):
population[np.random.randint(0, P)] = 1
r = 0
for i in range(S):
if population[i] == 1:
r += 1
if r == 0:
r = 1
return (m * S) / r
P = 500
m = 80
S = 90
N = 10000
results = [markRecapture(P, m, S) for i in range(N)]
df = results
fig = px.histogram(df, nbins=100)
fig.show()
fig = px.box(df)
fig.show()
$\hat{P}$ tends to overestimate.
P = 1000
m = 300
S = 200
N = 10000
results = [markRecapture(P, m, S) for i in range(N)]
df = results
fig = px.histogram(df, nbins=100)
fig.show()
fig = px.box(df)
fig.show()
P = 4000
m = 500
S = 500
N = 10000
results = [markRecapture(P, m, S) for i in range(N)]
df = results
fig = px.histogram(df, nbins=100)
fig.show()
fig = px.box(df)
fig.show()
P = 4000
m = 500
S = 200
N = 10000
results = [markRecapture(P, m, S) for i in range(N)]
df = results
fig = px.histogram(df, nbins=100)
fig.show()
fig = px.box(df)
fig.show()
P = 4000
m = 300
S = 600
N = 10000
results = [markRecapture(P, m, S) for i in range(N)]
df = results
fig = px.histogram(df, nbins=100)
fig.show()
fig = px.box(df)
fig.show()
P = 4000
m = 2000
S = 2000
N = 10000
results = [markRecapture(P, m, S) for i in range(N)]
df = results
fig = px.histogram(df, nbins=100)
fig.show()
fig = px.box(df)
fig.show()
P = 400000
m = 300
S = 350
N = 10000
results = [markRecapture(P, m, S) for i in range(N)]
df = results
fig = px.histogram(df, nbins=100)
fig.show()
fig = px.box(df)
fig.show()
This is where I got a "Division by zero" error. I did what Elizabeth said and made r = 1 if r == 0. These estimates are very far below true value, though. Let's see if it's m or S that makes the difference.
P = 400000
m = 300
S = 10000
N = 10000
results = [markRecapture(P, m, S) for i in range(N)]
df = results
fig = px.histogram(df, nbins=100)
fig.show()
fig = px.box(df)
fig.show()
With a large enough S, the algorithm again begins to overestimate, even with m being a relatively small number.
P = 400000
m = 30000
S = 350
N = 10000
results = [markRecapture(P, m, S) for i in range(N)]
df = results
fig = px.histogram(df, nbins=100)
fig.show()
fig = px.box(df)
fig.show()
Interestingly, with a high m and a low S, the algorithm also overestimates.
Let's see what happens when P, m, and S are all equal.
P = 4000
m = 4000
S = 4000
N = 10000
results = [markRecapture(P, m, S) for i in range(N)]
df = results
fig = px.histogram(df, nbins=100)
fig.show()
fig = px.box(df)
fig.show()
This is interesting as well. The algorithm significantly overestimates the population in this case, more so than usual for a population of this size. This just further serves to underscore the fact that the equation itself is skewed and changing the inputs will not fix the problem.
P = 30
m = 10
S = 10
N = 10000
results = [markRecapture(P, m, S) for i in range(N)]
df = results
fig = px.histogram(df, nbins=100)
fig.show()
fig = px.box(df)
fig.show()
P = 100
m = 20
S = 30
N = 10000
results = [markRecapture(P, m, S) for i in range(N)]
df = results
fig = px.histogram(df, nbins=100)
fig.show()
fig = px.box(df)
fig.show()
Even with very small populations, this overestimates significantly.
To conclude, this function tends to overestimate the true population and also skews to the right significantly. This is not a good way to determine the true size of a population.